Ryo Hachiuma

DVSM: Decoder-only View Synthesis Model Done Right

May 28, 2026

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

May 27, 2026

Learning from Synthetic Data via Provenance-Based Input Gradient Guidance

Apr 03, 2026

Interpretable Debiasing of Vision-Language Models for Social Fairness

Feb 27, 2026

VIOLA: Towards Video In-Context Learning with Minimal Annotations

Jan 22, 2026

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception

Jan 14, 2026

Masking Teacher and Reinforcing Student for Distilling Vision-Language Models

Dec 23, 2025

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Dec 22, 2025

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in

Dec 16, 2025

Unified Reinforcement and Imitation Learning for Vision-Language Models

Oct 22, 2025